Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 1623 |
| Missing cells | 814 |
| Missing cells (%) | 3.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 165.0 KiB |
| Average record size in memory | 104.1 B |
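The percentage and per-record figures in this table follow from the raw counts. The sketch below recomputes them as a sanity check; the variable names are illustrative, and because the report rounds the total size to 165.0 KiB, the recomputed record size is only approximate.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 13
n_observations = 1623
missing_cells = 814
total_size_kib = 165.0  # already rounded in the report

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Share of all cells that are missing, as a percentage.
missing_pct = 100 * missing_cells / total_cells

# Average bytes per row, using 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {avg_record_bytes:.1f} B")
```

All 814 missing cells come from a single column (date_of_establishment), which is why the overall share is small even though half of that column is empty.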
Variable types
| NUM | 8 |
|---|---|
| CAT | 4 |
| BOOL | 1 |
Reproduction
| Analysis started | 2020-07-10 12:35:43.001078 |
|---|---|
| Analysis finished | 2020-07-10 12:35:56.563166 |
| Duration | 13.56 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
date_of_establishment has a high cardinality: 627 distinct values | High cardinality |
location has a high cardinality: 693 distinct values | High cardinality |
loc.details has a high cardinality: 180 distinct values | High cardinality |
location.Code is highly correlated with id | High correlation |
id is highly correlated with location.Code | High correlation |
deposit_amount_2011 is highly correlated with headquarter and 5 other fields | High correlation |
headquarter is highly correlated with deposit_amount_2011 and 5 other fields | High correlation |
deposit_amount_2012 is highly correlated with headquarter and 5 other fields | High correlation |
deposit_amount_2013 is highly correlated with headquarter and 5 other fields | High correlation |
deposit_amount_2014 is highly correlated with headquarter and 5 other fields | High correlation |
deposit_amount_2015 is highly correlated with headquarter and 5 other fields | High correlation |
deposit_amount_2016 is highly correlated with headquarter and 5 other fields | High correlation |
date_of_establishment has 814 (50.2%) missing values | Missing |
deposit_amount_2011 is highly skewed (γ1 = 39.81107704) | Skewed |
deposit_amount_2012 is highly skewed (γ1 = 39.64549677) | Skewed |
deposit_amount_2013 is highly skewed (γ1 = 39.49573447) | Skewed |
deposit_amount_2014 is highly skewed (γ1 = 39.49528758) | Skewed |
deposit_amount_2015 is highly skewed (γ1 = 39.28702403) | Skewed |
deposit_amount_2016 is highly skewed (γ1 = 39.62477434) | Skewed |
id has unique values | Unique |
location.Code has unique values | Unique |
deposit_amount_2011 has 91 (5.6%) zeros | Zeros |
deposit_amount_2012 has 92 (5.7%) zeros | Zeros |
deposit_amount_2013 has 92 (5.7%) zeros | Zeros |
deposit_amount_2014 has 92 (5.7%) zeros | Zeros |
deposit_amount_2015 has 92 (5.7%) zeros | Zeros |
deposit_amount_2016 has 92 (5.7%) zeros | Zeros |
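The skewness warnings above report γ1 values near 39 for every deposit column. As a rough illustration of how one extreme value can dominate this statistic, the sketch below implements the adjusted Fisher-Pearson sample skewness. This estimator is assumed here to be the one behind the reported values; the function name and the two small data sets are synthetic and not taken from the report.

```python
import math


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (often written G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    # Second and third central moments (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant data: treat skewness as zero
    g1 = m3 / m2 ** 1.5
    # Small-sample correction factor.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]          # synthetic, symmetric
with_outlier = [1.0, 2.0, 3.0, 4.0, 5000.0]    # synthetic, one huge value

skew_symmetric = sample_skewness(symmetric)
skew_outlier = sample_skewness(with_outlier)

print(skew_symmetric, skew_outlier)
```

The symmetric sample has zero skewness, while a single very large value pushes the statistic strongly positive. The deposit columns show a similar pattern: each has one maximum several orders of magnitude above its median.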
id
| Distinct count | 1623 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 812.0 |
|---|---|
| Minimum | 1 |
| Maximum | 1623 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 12.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 82.1 |
| Q1 | 406.5 |
| median | 812 |
| Q3 | 1217.5 |
| 95-th percentile | 1541.9 |
| Maximum | 1623 |
| Range | 1622 |
| Interquartile range (IQR) | 811 |
Descriptive statistics
| Standard deviation | 468.6640588 |
|---|---|
| Coefficient of variation (CV) | 0.5771724862 |
| Kurtosis | -1.2 |
| Mean | 812 |
| Median Absolute Deviation (MAD) | 406 |
| Skewness | 0 |
| Sum | 1317876 |
| Variance | 219646 |
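The statistics in this block (distinct count 1623, minimum 1, maximum 1623, sum 1317876) are consistent with a column that holds each integer from 1 to 1623 exactly once. Under that assumption, which the report itself does not state, the descriptive figures can be reproduced with the Python standard library. The sketch uses the sample variance (n - 1 denominator) and takes MAD to mean the median of absolute deviations from the median.

```python
import statistics

# Assumption: the column is exactly 1, 2, ..., 1623.
ids = list(range(1, 1624))

total = sum(ids)
mean = statistics.mean(ids)
variance = statistics.variance(ids)   # sample variance, n - 1 denominator
std_dev = statistics.stdev(ids)       # square root of the sample variance
cv = std_dev / mean                   # coefficient of variation
median = statistics.median(ids)
# Median absolute deviation around the median.
mad = statistics.median(abs(x - median) for x in ids)

print(total, mean, variance, std_dev, cv, median, mad)
```

For consecutive integers 1..n the sample variance is n(n + 1)/12, which for n = 1623 gives the 219646 shown in the table.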
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 1623 | 1 | 0.1% | |
| 1088 | 1 | 0.1% | |
| 1068 | 1 | 0.1% | |
| 1070 | 1 | 0.1% | |
| 1072 | 1 | 0.1% | |
| 1074 | 1 | 0.1% | |
| 1076 | 1 | 0.1% | |
| 1078 | 1 | 0.1% | |
| 1080 | 1 | 0.1% | |
| 1082 | 1 | 0.1% | |
| Other values (1613) | 1613 | 99.4% |
| Value | Count | Frequency (%) | |
| 1 | 1 | 0.1% | |
| 2 | 1 | 0.1% | |
| 3 | 1 | 0.1% | |
| 4 | 1 | 0.1% | |
| 5 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 1623 | 1 | 0.1% | |
| 1622 | 1 | 0.1% | |
| 1621 | 1 | 0.1% | |
| 1620 | 1 | 0.1% | |
| 1619 | 1 | 0.1% |
headquarter
| Distinct count | 2 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.7 KiB |
| Value | Count | Frequency (%) | |
| 0 | 1622 | 99.9% | |
| 1 | 1 | 0.1% |
location.Code
| Distinct count | 1623 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1770.3148490449785 |
|---|---|
| Minimum | 5 |
| Maximum | 2870 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 12.7 KiB |
Quantile statistics
| Minimum | 5 |
|---|---|
| 5-th percentile | 209.3 |
| Q1 | 1333 |
| median | 1834 |
| Q3 | 2391.5 |
| 95-th percentile | 2780.9 |
| Maximum | 2870 |
| Range | 2865 |
| Interquartile range (IQR) | 1058.5 |
Descriptive statistics
| Standard deviation | 751.636198 |
|---|---|
| Coefficient of variation (CV) | 0.424577695 |
| Kurtosis | -0.4691148652 |
| Mean | 1770.314849 |
| Median Absolute Deviation (MAD) | 534 |
| Skewness | -0.582501805 |
| Sum | 2873221 |
| Variance | 564956.9742 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 2047 | 1 | 0.1% | |
| 1330 | 1 | 0.1% | |
| 1302 | 1 | 0.1% | |
| 1306 | 1 | 0.1% | |
| 1310 | 1 | 0.1% | |
| 1312 | 1 | 0.1% | |
| 1314 | 1 | 0.1% | |
| 1318 | 1 | 0.1% | |
| 1320 | 1 | 0.1% | |
| 1324 | 1 | 0.1% | |
| Other values (1613) | 1613 | 99.4% |
| Value | Count | Frequency (%) | |
| 5 | 1 | 0.1% | |
| 7 | 1 | 0.1% | |
| 8 | 1 | 0.1% | |
| 9 | 1 | 0.1% | |
| 10 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 2870 | 1 | 0.1% | |
| 2869 | 1 | 0.1% | |
| 2868 | 1 | 0.1% | |
| 2866 | 1 | 0.1% | |
| 2865 | 1 | 0.1% |
date_of_establishment
| Distinct count | 627 |
|---|---|
| Unique (%) | 77.5% |
| Missing | 814 |
| Missing (%) | 50.2% |
| Memory size | 12.7 KiB |
| 1801-01-01 | 28 |
|---|---|
| 1998-01-07 | 22 |
| 1920-01-01 | 9 |
| 1888-01-01 | 8 |
| 1906-01-01 | 7 |
| Other values (622) |
| Value | Count | Frequency (%) | |
| 1801-01-01 | 28 | 1.7% | |
| 1998-01-07 | 22 | 1.4% | |
| 1920-01-01 | 9 | 0.6% | |
| 1888-01-01 | 8 | 0.5% | |
| 1906-01-01 | 7 | 0.4% | |
| 1935-01-05 | 7 | 0.4% | |
| 1900-01-01 | 7 | 0.4% | |
| 1890-01-01 | 6 | 0.4% | |
| 1926-01-01 | 5 | 0.3% | |
| 1999-09-03 | 5 | 0.3% | |
| Other values (617) | 705 | 43.4% | |
| (Missing) | 814 | 50.2% |
Length
| Max length | 10 |
|---|---|
| Median length | 3 |
| Mean length | 6.489217498 |
| Min length | 3 |
location
| Distinct count | 693 |
|---|---|
| Unique (%) | 42.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.7 KiB |
| New York City | 78 |
|---|---|
| Houston | 67 |
| Brooklyn | 37 |
| Dallas | 33 |
| Phoenix | 31 |
| Other values (688) |
| Value | Count | Frequency (%) | |
| New York City | 78 | 4.8% | |
| Houston | 67 | 4.1% | |
| Brooklyn | 37 | 2.3% | |
| Dallas | 33 | 2.0% | |
| Phoenix | 31 | 1.9% | |
| Bronx | 31 | 1.9% | |
| Tucson | 27 | 1.7% | |
| Columbus | 25 | 1.5% | |
| Baton Rouge | 24 | 1.5% | |
| Austin | 22 | 1.4% | |
| Other values (683) | 1248 | 76.9% |
Length
| Max length | 19 |
|---|---|
| Median length | 8 |
| Mean length | 8.854590265 |
| Min length | 3 |
loc.details
| Distinct count | 180 |
|---|---|
| Unique (%) | 11.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.7 KiB |
| Maricopa | 84 |
|---|---|
| Harris | 77 |
| New York | 76 |
| Cook | 65 |
| Dallas | 60 |
| Other values (175) |
| Value | Count | Frequency (%) | |
| Maricopa | 84 | 5.2% | |
| Harris | 77 | 4.7% | |
| New York | 76 | 4.7% | |
| Cook | 65 | 4.0% | |
| Dallas | 60 | 3.7% | |
| Wayne | 58 | 3.6% | |
| Queens | 41 | 2.5% | |
| Kings | 38 | 2.3% | |
| Franklin | 37 | 2.3% | |
| Westchester | 36 | 2.2% | |
| Other values (170) | 1051 | 64.8% |
Length
| Max length | 20 |
|---|---|
| Median length | 7 |
| Mean length | 6.964879852 |
| Min length | 3 |
state
Categorical
| Distinct count | 14 |
|---|---|
| Unique (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 12.7 KiB |
| Value | Count | Frequency (%) | |
| NY | 339 | 20.9% | |
| TX | 282 | 17.4% | |
| OH | 252 | 15.5% | |
| MI | 208 | 12.8% | |
| AZ | 144 | 8.9% | |
| LA | 140 | 8.6% | |
| IL | 119 | 7.3% | |
| NJ | 39 | 2.4% | |
| CT | 28 | 1.7% | |
| WV | 26 | 1.6% | |
| Other values (4) | 46 | 2.8% |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
deposit_amount_2011
| Distinct count | 1520 |
|---|---|
| Unique (%) | 93.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 836240.7791127542 |
|---|---|
| Minimum | 0.0 |
| Maximum | 949696500.0 |
| Zeros | 91 |
| Zeros (%) | 5.6% |
| Memory size | 12.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 50784.75 |
| median | 98578.5 |
| Q3 | 181356.75 |
| 95-th percentile | 461689.95 |
| Maximum | 949696500 |
| Range | 949696500 |
| Interquartile range (IQR) | 130572 |
Descriptive statistics
| Standard deviation | 23664391.51 |
|---|---|
| Coefficient of variation (CV) | 28.29853805 |
| Kurtosis | 1596.53699 |
| Mean | 836240.7791 |
| Median Absolute Deviation (MAD) | 56863.5 |
| Skewness | 39.81107704 |
| Sum | 1357218784 |
| Variance | 5.600034254e+14 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 91 | 5.6% | |
| 96172.5 | 2 | 0.1% | |
| 61548 | 2 | 0.1% | |
| 31500 | 2 | 0.1% | |
| 181585.5 | 2 | 0.1% | |
| 81405 | 2 | 0.1% | |
| 126115.5 | 2 | 0.1% | |
| 67957.5 | 2 | 0.1% | |
| 41809.5 | 2 | 0.1% | |
| 154339.5 | 2 | 0.1% | |
| Other values (1510) | 1514 | 93.3% |
| Value | Count | Frequency (%) | |
| 0 | 91 | 5.6% | |
| 1.5 | 1 | 0.1% | |
| 247.5 | 1 | 0.1% | |
| 2107.5 | 1 | 0.1% | |
| 5209.5 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 949696500 | 1 | 0.1% | |
| 67648614 | 1 | 0.1% | |
| 39534582 | 1 | 0.1% | |
| 33151629 | 1 | 0.1% | |
| 10430034 | 1 | 0.1% |
deposit_amount_2012
| Distinct count | 1527 |
|---|---|
| Unique (%) | 94.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 984752.9593345657 |
|---|---|
| Minimum | 0.0 |
| Maximum | 1114902000.0 |
| Zeros | 92 |
| Zeros (%) | 5.7% |
| Memory size | 12.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 53303.25 |
| median | 103470 |
| Q3 | 193895.25 |
| 95-th percentile | 481739.85 |
| Maximum | 1114902000 |
| Range | 1114902000 |
| Interquartile range (IQR) | 140592 |
Descriptive statistics
| Standard deviation | 27822195.32 |
|---|---|
| Coefficient of variation (CV) | 28.25296949 |
| Kurtosis | 1587.10057 |
| Mean | 984752.9593 |
| Median Absolute Deviation (MAD) | 60190.5 |
| Skewness | 39.64549677 |
| Sum | 1598254053 |
| Variance | 7.740745524e+14 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 27235.5 | 2 | 0.1% | |
| 70248 | 2 | 0.1% | |
| 48249 | 2 | 0.1% | |
| 66639 | 2 | 0.1% | |
| 68136 | 2 | 0.1% | |
| 59295 | 1 | 0.1% | |
| 32187 | 1 | 0.1% | |
| 488785.5 | 1 | 0.1% | |
| 202921.5 | 1 | 0.1% | |
| Other values (1517) | 1517 | 93.5% |
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 237 | 1 | 0.1% | |
| 2020.5 | 1 | 0.1% | |
| 3277.5 | 1 | 0.1% | |
| 4963.5 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 1114902000 | 1 | 0.1% | |
| 93946476 | 1 | 0.1% | |
| 55487037 | 1 | 0.1% | |
| 41281806 | 1 | 0.1% | |
| 15721390.5 | 1 | 0.1% |
deposit_amount_2013
| Distinct count | 1524 |
|---|---|
| Unique (%) | 93.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1107470.495378928 |
|---|---|
| Minimum | 0.0 |
| Maximum | 1248682500.0 |
| Zeros | 92 |
| Zeros (%) | 5.7% |
| Memory size | 12.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 57223.5 |
| median | 112302 |
| Q3 | 207204 |
| 95-th percentile | 528510.3 |
| Maximum | 1248682500 |
| Range | 1248682500 |
| Interquartile range (IQR) | 149980.5 |
Descriptive statistics
| Standard deviation | 31203622.44 |
|---|---|
| Coefficient of variation (CV) | 28.17557901 |
| Kurtosis | 1578.406914 |
| Mean | 1107470.495 |
| Median Absolute Deviation (MAD) | 64885.5 |
| Skewness | 39.49573447 |
| Sum | 1797424614 |
| Variance | 9.736660537e+14 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 173326.5 | 2 | 0.1% | |
| 137647.5 | 2 | 0.1% | |
| 127350 | 2 | 0.1% | |
| 34819.5 | 2 | 0.1% | |
| 46689 | 2 | 0.1% | |
| 136798.5 | 2 | 0.1% | |
| 53697 | 2 | 0.1% | |
| 77494.5 | 2 | 0.1% | |
| 108627 | 1 | 0.1% | |
| Other values (1514) | 1514 | 93.3% |
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 243 | 1 | 0.1% | |
| 1920 | 1 | 0.1% | |
| 4131 | 1 | 0.1% | |
| 5425.5 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 1248682500 | 1 | 0.1% | |
| 122940568.5 | 1 | 0.1% | |
| 58192191 | 1 | 0.1% | |
| 54329896.5 | 1 | 0.1% | |
| 18817225.5 | 1 | 0.1% |
deposit_amount_2014
| Distinct count | 1529 |
|---|---|
| Unique (%) | 94.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1220472.520332717 |
|---|---|
| Minimum | 0.0 |
| Maximum | 1374814500.0 |
| Zeros | 92 |
| Zeros (%) | 5.7% |
| Memory size | 12.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 61818.75 |
| median | 120445.5 |
| Q3 | 226321.5 |
| 95-th percentile | 578735.85 |
| Maximum | 1374814500 |
| Range | 1374814500 |
| Interquartile range (IQR) | 164502.75 |
Descriptive statistics
| Standard deviation | 34354845.66 |
|---|---|
| Coefficient of variation (CV) | 28.14880719 |
| Kurtosis | 1578.512689 |
| Mean | 1220472.52 |
| Median Absolute Deviation (MAD) | 71122.5 |
| Skewness | 39.49528758 |
| Sum | 1980826900 |
| Variance | 1.18025542e+15 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 27117 | 2 | 0.1% | |
| 136974 | 2 | 0.1% | |
| 76243.5 | 2 | 0.1% | |
| 253951.5 | 1 | 0.1% | |
| 53421 | 1 | 0.1% | |
| 55195.5 | 1 | 0.1% | |
| 163495.5 | 1 | 0.1% | |
| 172339.5 | 1 | 0.1% | |
| 105888 | 1 | 0.1% | |
| Other values (1519) | 1519 | 93.6% |
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 208.5 | 1 | 0.1% | |
| 3394.5 | 1 | 0.1% | |
| 4162.5 | 1 | 0.1% | |
| 4657.5 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 1374814500 | 1 | 0.1% | |
| 127766710.5 | 1 | 0.1% | |
| 78427267.5 | 1 | 0.1% | |
| 58728975 | 1 | 0.1% | |
| 21045706.5 | 1 | 0.1% |
deposit_amount_2015
| Distinct count | 1525 |
|---|---|
| Unique (%) | 94.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1388776.3086876154 |
|---|---|
| Minimum | 0.0 |
| Maximum | 1548823500.0 |
| Zeros | 92 |
| Zeros (%) | 5.7% |
| Memory size | 12.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 64618.5 |
| median | 127141.5 |
| Q3 | 238721.25 |
| 95-th percentile | 614772.45 |
| Maximum | 1548823500 |
| Range | 1548823500 |
| Interquartile range (IQR) | 174102.75 |
Descriptive statistics
| Standard deviation | 38776096.5 |
|---|---|
| Coefficient of variation (CV) | 27.9210527 |
| Kurtosis | 1566.640917 |
| Mean | 1388776.309 |
| Median Absolute Deviation (MAD) | 75238.5 |
| Skewness | 39.28702403 |
| Sum | 2253983949 |
| Variance | 1.50358566e+15 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 95983.5 | 2 | 0.1% | |
| 67686 | 2 | 0.1% | |
| 73122 | 2 | 0.1% | |
| 65986.5 | 2 | 0.1% | |
| 216211.5 | 2 | 0.1% | |
| 336804 | 2 | 0.1% | |
| 127141.5 | 2 | 0.1% | |
| 132466.5 | 1 | 0.1% | |
| 179373 | 1 | 0.1% | |
| Other values (1515) | 1515 | 93.3% |
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 199.5 | 1 | 0.1% | |
| 3805.5 | 1 | 0.1% | |
| 4023 | 1 | 0.1% | |
| 5275.5 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 1548823500 | 1 | 0.1% | |
| 147483243 | 1 | 0.1% | |
| 123612354 | 1 | 0.1% | |
| 70655661 | 1 | 0.1% | |
| 27182524.5 | 1 | 0.1% |
deposit_amount_2016
| Distinct count | 1526 |
|---|---|
| Unique (%) | 94.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1412397.887245841 |
|---|---|
| Minimum | 0.0 |
| Maximum | 1604137500.0 |
| Zeros | 92 |
| Zeros (%) | 5.7% |
| Memory size | 12.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 69571.5 |
| median | 134331 |
| Q3 | 258485.25 |
| 95-th percentile | 648135.45 |
| Maximum | 1604137500 |
| Range | 1604137500 |
| Interquartile range (IQR) | 188913.75 |
Descriptive statistics
| Standard deviation | 40037728.32 |
|---|---|
| Coefficient of variation (CV) | 28.34734368 |
| Kurtosis | 1586.037873 |
| Mean | 1412397.887 |
| Median Absolute Deviation (MAD) | 79812 |
| Skewness | 39.62477434 |
| Sum | 2292321771 |
| Variance | 1.603019689e+15 |
Histogram with fixed size bins (bins=10)
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 100440 | 2 | 0.1% | |
| 89076 | 2 | 0.1% | |
| 105454.5 | 2 | 0.1% | |
| 21832.5 | 2 | 0.1% | |
| 160029 | 2 | 0.1% | |
| 68163 | 2 | 0.1% | |
| 205990.5 | 1 | 0.1% | |
| 70015.5 | 1 | 0.1% | |
| 125809.5 | 1 | 0.1% | |
| Other values (1516) | 1516 | 93.4% |
| Value | Count | Frequency (%) | |
| 0 | 92 | 5.7% | |
| 177 | 1 | 0.1% | |
| 3042 | 1 | 0.1% | |
| 3943.5 | 1 | 0.1% | |
| 5283 | 1 | 0.1% |
| Value | Count | Frequency (%) | |
| 1604137500 | 1 | 0.1% | |
| 126853557 | 1 | 0.1% | |
| 91053255 | 1 | 0.1% | |
| 64990429.5 | 1 | 0.1% | |
| 28283470.5 | 1 | 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear relationship the steepness of the slope does not affect r; only its sign does. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
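The calculation described above can be sketched in plain Python. This is a minimal illustration rather than the code the report was generated with; the function name and the sample data are made up for the example. The 1/(n - 1) factors of the covariance and the standard deviations cancel, so they are left out.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; common factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]          # illustrative data
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)

# Shifting and rescaling each variable separately (positive scale) leaves r unchanged.
r_rescaled = pearson_r([10 * x + 100 for x in xs], [0.5 * y - 3 for y in ys])

# Negating one variable flips the sign of r.
r_negated = pearson_r(xs, [-y for y in ys])

print(r, r_rescaled, r_negated)
```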
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
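As a sketch of that definition, the example below ranks each variable (tied values share the average of their positions, a common convention assumed here) and then applies the Pearson formula to the ranks. The helper names and the data are illustrative only.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]     # monotonic but not linear

rho = spearman_rho(xs, ys)
r = pearson_r(xs, ys)
print(rho, r)
```

For this cubic relationship the ranks agree perfectly, so ρ reaches its maximum while Pearson's r stays below 1.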
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
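The pair-counting rule can be sketched directly. The example follows the formula exactly as written above (the variant usually called τ-a); statistical libraries often apply a tie correction (τ-b) instead, so their results can differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    pairs = list(combinations(zip(xs, ys), 2))
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in pairs:
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 means a tie, counted as neither
    return (concordant - discordant) / len(pairs)


xs = [1, 2, 3, 4, 5]          # illustrative data
ys = [2, 4, 5, 4, 5]

tau = kendall_tau_a(xs, ys)
tau_increasing = kendall_tau_a(xs, [10, 20, 30, 40, 50])
tau_decreasing = kendall_tau_a(xs, [50, 40, 30, 20, 10])
print(tau, tau_increasing, tau_decreasing)
```

In the first data set, 7 of the 10 pairs are concordant, 1 is discordant and 2 are tied in y, which is where the tie-handling choice would start to matter.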
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.
First rows
| | id | headquarter | location.Code | date_of_establishment | location | loc.details | state | deposit_amount_2011 | deposit_amount_2012 | deposit_amount_2013 | deposit_amount_2014 | deposit_amount_2015 | deposit_amount_2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 5 | 1824-12-31 | Columbus | Delaware | OH | 949696500.0 | 1.114902e+09 | 1.248682e+09 | 1.374814e+09 | 1.548824e+09 | 1.604138e+09 |
| 1 | 2 | 0 | 7 | NaN | Scarsdale | Westchester | NY | 439843.5 | 4.661865e+05 | 4.886130e+05 | 4.918950e+05 | 4.916880e+05 | 5.122125e+05 |
| 2 | 3 | 0 | 8 | 1964-09-08 | Great Neck | Nassau | NY | 286516.5 | 3.103995e+05 | 3.246585e+05 | 3.569745e+05 | 3.512745e+05 | 3.936825e+05 |
| 3 | 4 | 0 | 9 | NaN | Hartsdale | Westchester | NY | 130665.0 | 1.325505e+05 | 1.397445e+05 | 1.644885e+05 | 1.679775e+05 | 1.751580e+05 |
| 4 | 5 | 0 | 10 | NaN | Lawrence | Nassau | NY | 258912.0 | 2.591235e+05 | 2.841195e+05 | 2.976675e+05 | 3.077970e+05 | 3.348000e+05 |
| 5 | 6 | 0 | 14 | NaN | Mount Vernon | Westchester | NY | 220230.0 | 2.050080e+05 | 2.110170e+05 | 2.314695e+05 | 2.230605e+05 | 2.182485e+05 |
| 6 | 7 | 0 | 17 | 1966-11-12 | Bronx | Bronx | NY | 112696.5 | 1.202580e+05 | 1.234995e+05 | 1.418070e+05 | 1.455690e+05 | 1.607490e+05 |
| 7 | 8 | 0 | 20 | NaN | Bronx | Bronx | NY | 59832.0 | 6.381900e+04 | 6.570000e+04 | 6.880050e+04 | 7.704450e+04 | 8.503950e+04 |
| 8 | 9 | 0 | 21 | NaN | Bronx | Bronx | NY | 110553.0 | 1.050735e+05 | 1.056705e+05 | 1.184190e+05 | 1.210155e+05 | 1.241340e+05 |
| 9 | 10 | 0 | 23 | NaN | Bronx | Bronx | NY | 104667.0 | 1.092240e+05 | 1.120935e+05 | 1.132050e+05 | 1.181385e+05 | 1.259670e+05 |
Last rows
| | id | headquarter | location.Code | date_of_establishment | location | loc.details | state | deposit_amount_2011 | deposit_amount_2012 | deposit_amount_2013 | deposit_amount_2014 | deposit_amount_2015 | deposit_amount_2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1613 | 1614 | 0 | 2860 | NaN | Fox Point | Milwaukee | WI | 196357.5 | 212155.5 | 227674.5 | 282112.5 | 198720.0 | 212781.0 |
| 1614 | 1615 | 0 | 2861 | 1922-01-01 | Milwaukee | Milwaukee | WI | 30301.5 | 33112.5 | 38347.5 | 39847.5 | 43236.0 | 43119.0 |
| 1615 | 1616 | 0 | 2862 | NaN | Milwaukee | Milwaukee | WI | 56086.5 | 58680.0 | 62710.5 | 71485.5 | 73122.0 | 76455.0 |
| 1616 | 1617 | 0 | 2863 | NaN | Milwaukee | Milwaukee | WI | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1617 | 1618 | 0 | 2864 | 1913-08-05 | Milwaukee | Milwaukee | WI | 53412.0 | 55384.5 | 61980.0 | 62097.0 | 63099.0 | 67599.0 |
| 1618 | 1619 | 0 | 2865 | 1910-02-10 | Cudahy | Milwaukee | WI | 103951.5 | 133564.5 | 138643.5 | 150294.0 | 159280.5 | 152766.0 |
| 1619 | 1620 | 0 | 2866 | NaN | Wauwatosa | Milwaukee | WI | 98406.0 | 105657.0 | 114579.0 | 124258.5 | 139989.0 | 150336.0 |
| 1620 | 1621 | 0 | 2868 | NaN | Mequon | Ozaukee | WI | 83460.0 | 86874.0 | 98116.5 | 124689.0 | 126501.0 | 137949.0 |
| 1621 | 1622 | 0 | 2869 | NaN | Delafield | Waukesha | WI | 81405.0 | 89365.5 | 98139.0 | 93705.0 | 120355.5 | 122323.5 |
| 1622 | 1623 | 0 | 2870 | NaN | Eagle | Waukesha | WI | 25537.5 | 25537.5 | 28282.5 | 30828.0 | 35551.5 | 35727.0 |